A New approach for Classification of Highly Imbalanced Datasets using Evolutionary Algorithms
نویسندگان
چکیده
Today’s most of the research interest is in the application of evolutionary algorithms. One of the examples is classification rules in imbalanced domains. The problem of Imbalanced data sets plays a major challenge in data mining community. In imbalanced data sets, the number of instances of one class is much higher than the others, and the class of fewer representatives is of more interest from the point of the learning task. Traditional Machine Learning algorithms work well with balanced data sets, but not able to deal with classification of imbalanced data sets. In the present paper we use different operators of Genetic Algorithms (GA) for over-sampling to enlarge the ratio of positive samples, and then apply clustering to the over-sampled training dataset as a data cleaning method for both classes, removing the redundant or noisy samples. The proposed approach was experimentally analyzed and the experimental results shows an improvement in the classification measured as the area under the receiver operating characteristics (ROC) curve.
منابع مشابه
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
متن کاملارائهروش جدید مبتنیبر برنامهنویسی ژنتیک برای وزندهی قوانین فازی در طبقهبندی نامتوازن
In classification problems, we often encounter datasets with different percentage of patterns (i.e. classes with a high pattern percentage and classes with a low pattern percentage). These problems are called “classification Problems with imbalanced data-sets”. Fuzzy rule based classification systems are the most popular fuzzy modeling systems used in pattern classification problems. Rule weights...
متن کاملSpectral-spatial classification of hyperspectral images by combining hierarchical and marker-based Minimum Spanning Forest algorithms
Many researches have demonstrated that the spatial information can play an important role in the classification of hyperspectral imagery. This study proposes a modified spectral–spatial classification approach for improving the spectral–spatial classification of hyperspectral images. In the proposed method ten spatial/texture features, using mean, standard deviation, contrast, homogeneity, corr...
متن کاملCUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...
متن کاملAn Evolutionary Multi-objective Discretization based on Normalized Cut
Learning models and related results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results are tending to be incorrect. Therefore, discretization as one of the preprocessing techniques plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...
متن کامل